Orbiting binary black hole evolutions with a multipatch high order finite-difference 

approach 
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We present numerical simulations of orbiting black holes for around twelve cycles, using a high- 
order multipatch approach. Unlike some other approaches, the computational speed scales almost 
perfectly for thousands of processors. Multipatch methods are an alternative to AMR (adaptive 
mesh refinement), with benefits of simplicity and better scaling for improving the resolution in the 
wave zone. The results presented here pave the way for multipatch evolutions of black hole-neutron 
star and neutron star-neutron star binaries, where high resolution grids are needed to resolve details 
of the matter flow. 
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I. INTRODUCTION 

Mergers of binary compact objects (neutron stars or 
black holes) are expected to be the main sources of grav- 
itational waves for the ground-based interferometric de- 
tectors LIGO, GEO, VIRGO, and TAMA. Neutron star- 
neutron star and black holc-ncutron star binaries arc 
also interesting because they are leading candidates for 
explaining the production of short-duration gamma-ray 
bursts and because gravitational wave signals from these 
events may encode information about the neutron star 
equation of state P, [3, Q . Such a merger can be accu- 
rately modeled only by the numerical evolution of the 
full Einstein field equations coupled (if a neutron star is 
present) to an evolution of the neutron star matter. 

Because of advances in numerical relativity in recent 
years, stable evolutions can now be performed for most 
binary cases. Accuracy and speed are now the pressing 
numerical challenges: how to achieve the minimum error 
given limited time and computational resources. A good 
code should converge rapidly with increasing resolution 
to the exact solution. Its speed should scale well with the 
number of processors used in order to make good use of 
parallelization. Also, an efficient use of resources will re- 
quire a grid well adapted to the problem at hand. This in- 
cludes using a grid with the most appropriate shape. For 
example, it is reasonable to suppose that excision inner 
boundaries and outer boundaries should be spherical. A 
good grid will also use higher resolution where it is most 
needed. For example, although the grid must extend out 
into the wave zone to extract the gravitational wave sig- 
nal, lower resolution is needed in the wave zone than is 
needed in the vicinity of a black hole or neutron star. The 
need for high resolution in neutron stars and black hole 
accretion disks can become particularly acute in cases 
of hydrodynamic or magnctohydrodynamic instabilities, 
such as convective, Kelvin-Hclmholtz, or magnctorota- 



tional instabilities. In such cases, the length scale of the 
unstable modes can be much smaller than the radius of 
the star or disk, and the evolution will be qualitatively 
wrong if the instability is completely unresolved. 

One technique that has been successfully used to 
deal with this problem is adaptive mesh refinement 
(AMR) 0, [i]. These AMR codes generally use overlap- 
ping Cartesian meshes of varying levels of refinement, 
with the finer meshes being used only where they are 
determined (by some algorithm) to be needed. In this 
paper, we present a different method of achieving effi- 
cient grid coverage, one that is algorithmically simpler 
and that possesses some unique advantages. 

This different technique for evolving binary compact 
systems involves using multiple grid patches, each patch 
having its own shape, curvilinear coordinates and reso- 
lution. The basic ideas behind these multipatch meth- 
ods have been worked out in earlier papers [1, 0, [l]- 
In these references some particular patch configurations 
using cubes and cubcd-sphercs were used. The cubed- 
sphere patches were used to construct grids with exactly 
spherical inner excision boundaries and outer bound- 
aries. These methods are, hence, ideal for calculations 
that involve excision. (Using AMR with excision intro- 
duces a number of complications.) These techniques were 
then successfully used to simulate perturbed Kerr black 
holes d, ■ Multiple patches in cubed-sphere arrange- 
ment have also been used to evolve the shallow water 
equations pd| and to simulate hydrodynamic flows in 
black hole accretion disks [H, [H, [3| • 

Another multipatch approach has been used by the 
Cornell-Caltecli group to evolve Einstein's equations for 
binary black hole and black hole-neutron star [l^ 
systems. In the binary black hole case derivatives in these 
simulations are computed pseudospectrally, rather than 
using flnite differencing. While pseudospectral methods 
produce accurate results very efficiently for binary black 
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hole evolutions, they are much less cost effective for sys- 
tems involving matter. One reason for this is that the 
discontinuities that naturally appear in the fluid flow 
at shocks and stellar surfaces destroy the exponential 
convergence of spectral methods. In fact, the Cornell- 
Caltech group found it necessary to evolve the fluid vari- 
ables using finite differencing, while evolving the field 
variables pseudospectrally. This required two indepen- 
dent grids: the finite difference gridpoints used to evolve 
the fluid, and the collocation points of the pseudospec- 
tral code used to evolve the metric. For the two grids 
to communicate, variables had to be interpolated from 
one grid to the other each timestep, a process which con- 
sumed about one third of the CPU time in each simu- 
lation. Another problem with pseudospectral techniques 
is that they usually do not scale well to large numbers 
of processors. In regions without discontinuities, where 
spectral convergence is not lost, one cannot, for example, 
split one large domain into two domains with half the 
number of collocation points each without a signiflcant 
loss in accuracy. On the other hand, accurate simulations 
of binary neutron star or black hole-neutron star mergers 
are not practical without many processors. 

It would therefore seem preferable to evolve both the 
fluid and the metric with finite differencing. This could 
significantly improve the scalability, allowing simulations 
on hundreds or thousands of processors. It would also re- 
move the need for two separate grids and the expensive 
interpolation between them. Multipatch techniques are 
the natural finite difference version of the Cornell-Caltech 
pseudospectral evolution algorithm. As a first step in 
that direction, in this paper we evolve a binary black 
hole system using multipatches together with high order 
finite-differencing operators. We show that our code con- 
verges rapidly, scales well to thousands of processors, and 
can stably simulate several orbits of the inspiral. 



II. EVOLUTION EQUATIONS 

At the continuum level, the techniques used in this pa- 
per are exactly those ones previously used by the Caltcch- 
Cornell collaboration in binary black hole simulations. 
We use the first order form of the generalized harmonic 
system presented in The evolution variables in this 
formulation are the 4-metric gab and its first derivatives 
in space and time dcgab- (The indices run from to 3.) 
We use the constraint preserving boundary conditions of 
[13, [3- The evolution of the gauge is determined 
by the gauge source functions Ha — —g'^'^^acd, which are 
freely specifiable functions of space and time. In this pa- 
per, the gauge is set by choosing Ha to be constant in 
time in a coordinate system that comoves with the holes. 
This comoving coordinate system is determined using the 
same dual frame and control tracking mechanism as was 
used for the spectral binary black hole evolutions [20| . 
This technique uses two coordinate frames, which we la- 
bel and X*. The coordinate frame is set to be an 



asymptotically flat, inertial frame. All tensor compo- 
nents are evaluated with respect to this frame. The grid- 
points are fixed in the computational frame . By means 
of a mapping between the frames, the computational co- 
ordinates can be made to approximately comove with the 
system. For the runs in this paper, we track the binary 
using a simple combination of rotation and radial scaling: 

t = t 

X = a[x cos{6) — y sin{9)] 
y = a[x sm{6) + y cos(6')] 
z = az , 

where 9 and a are functions of time which are evolved 
using a feedback mechanism to keep the location of the 
black holes fixed in the computational domain. 

The differences between the simulations presented in 
this paper and the earlier spectral simulations the type 
of domain decomposition, and in the numerical tech- 
niques used to compute the right-hand sides of the evolu- 
tion equations (e.g. how spatial derivatives are approxi- 
mated). Our handhng of these issues is described below. 



III. INITIAL DATA 

The initial data that we use here consists of a snapshot 
at a given time of the highest resolution 16-orbit simu- 
lation done by the Caltech-Cornell collaboration, which 
corresponds to the run 30cl reported in Refs. [3, [2l[. 
The starting time i = in our simulations corresponds 
to the instant t = 2887 M of the 16-orbit simulation (with 
AI being the sum of the irreducible masses of each black 
holes). From that point, the black holes orbit for about 
6 orbits before merger, although our runs stop before the 
merger takes place. 

This way of specifying the initial data has the advan- 
tage that there is no junk radiation present in the compu- 
tational domain at our starting time. Since the domains 
and points used in this paper are different from those 
used in the spectral simulation, we spectrally interpolate 
the initial data to the multipatch domain. 

The outer boundary of our domain is a sphere of ra- 
dius r = 144 Af. This value is actually mapped to 
r' — 105 A/ by the dual-frame coordinate transforma- 
tion, which scales and rotates the inertial coordinates 
into the comoving ones. The coordinate transformation 
is a simple rescaling of the radial coordinate r' = a{t)r by 
a time dependent factor, and a rigid rotation about the 
z axis. Since the binary system has been evolving before 
our i = time, the scale factor has a value a = 0.727 and 
the rotation angle is 9 = 57.95 radians at the beginning 
of our simulations. The black hole coordinate separation 
at the beginning of the 30cl run is 14.44 M. At our time 
t = the initial coordinate separation is 10.5 M. 
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IV. MULTI-BLOCK DOMAIN 
A. Structure 

We use two types of basic building patches to cover 
the whole computational domain. One is simply a cuboid 
with a Cartesian coordinate map. The other is a combi- 
nation of six patches that we call a juggling ball. A jug- 
gling ball can assume two different configurations. The 
first of them is shown at the top of Fig. [T] It consists 
of a cube whose interior has been excised by a sphere. 
We will refer to it as an mner juggling ball because it is 
the one that we use to excise the interior of each black 
hole and to cover its immediate surroundings. The sec- 
ond configuration is shown at the bottom of Fig. [1] and 
consists of a sphere whose interior has been excised by 
a cube. We will call it an outer juggling ball because 
it is the one that covers the region away from the black 
holes, reaching to the outer boundary. Both types of jug- 
gling balls use a radial coordinate that adjusts smoothly 
to their geometry. Each surface of constant radial coor- 
dinate is endowed with six two-dimensional coordinate 
maps, in the same fashion as the cubed sphere (Tlj. In 
essence, the juggling ball is a collection of six patches, 
each of them topologically equivalent to a cube.^ 

The basic layout of the full domain used in this paper is 
shown in Fig. [2] The centers of the excised spheres (which 
will be inside each black hole) are located along the x 
axis at x =^ ±a. Here we have used two inner juggling 
balls, one around each black hole. Their individual outer 
boundaries are cubes with sides of length 2a. When they 
are put together, we end up with a cuboid of dimensions 
4a X 2a X 2a, with the longest side along the x axis. 
We surround this structure with six cuboid patches of 
dimensions 4a x 2a x a, aligning them along the y and z 
axes. After doing so, we end up with a cubical domain 
with sides of length 4a. To complete the patch system we 
add an outer juggling ball whose cubical interior holds the 
two inner juggling balls plus the six cuboids. The outer 
juggling ball enables us to shape the outer boundary into 
a sphere, in which case moving the boundary further out 
requires an increase in the number of grid points that 
scales as 0{N) (as opposed to 0{N'^)). 

The total number of patches in this basic configura- 
tion is 6 cuboidal patches 4- 6 x (3 juggling balls) = 24 
patches. 

None of the patches used in this paper overlap with any 
other (in which case they are usually called blocks). A 
given block communicates with adjacent ones only by the 
two-dimensional common surface between them. Accord- 
ingly, we handle parallelization by assigning one block 
per processor, in this way minimizing communication be- 





^ The name juggling ball was chosen because some real juggling 
balls have a set of six quadrilateral-shaped designs on their sur- 
face. 



Figure 1: Equatorial cross-section of an inner juggling ball 
(top). Black lines denote the block boundaries. Colored lines 
represent the coordinate grid of each block. Equatorial cross- 
section of an outer juggling ball (bottom). 



twecn processors. 

In this basic 24-block domain case, we would use ex- 
actly 24 processors, which is a fairly small number for 
a binary black hole simulation. In order to achieve 
higher resolutions by increasing the amount of points 
per block, we subdivide the existing blocks into smaller 
pieces. Since the topology of each block is cubical, it is 
straightforward to subdivide them. The guiding principle 
that we use to accomplish the subdivision is to keep the 
same number of points per block for every single block. 
Although this condition is not necessary, it is convenient 
because it balances the computational load across all the 
processors. 

For the runs presented here, we used 192- and 384- 
block domains. The first case is obtained by subdividing 
the inner juggling balls uniformly in the radial direction 
7 times. The 6 cuboids are split by a factor of 2 and the 
outer juggling ball is divided 4 times in the radial and 
twice in each transverse direction. The 384-block case is 
derived from the 192-block one by further split of each 
block in the radial direction by a factor of 2. 

Figure [3] shows the multipatch structures used in this 
paper. 



Figure 2: Equatorial cut of the computational domain (top). 
Sdiematic figure showing the direction considered as radial 
(red arrows) for the cuboidal blocks (bottom). 



B. Numerical techniques 

In our simulations wc use the I?8-4 summation-by- 
parts (SBP) finite difference operator and its associated 
dissipation constructed in [3] . The naming convention is 
meant to indicate that the derivative is 8th order accurate 
in the bulk of each block but only 4th order accurate near 
inter-block boundaries. The derivative in the interior of 
each block is a centered one and is modified near bound- 
aries so as to satisfy the SBP property with respect to 
a diagonal norm; this is the cause of the drop in conver- 
gence. Information across sub-domains is communicated 
using characteristic variables and a penalty method (see 
0,013 for more details). 

The combination of these techniques guarantees nu- 
merical stability, but at the expense of the drop in conver- 
gence order near boundaries. For example, in the I?8-4 
case there are eight points near each boundary where the 
scheme is fourth order. For technical reasons explained 
below, in the simulations of this paper we use a rather 
large number of blocks and processors (192 and 384), 
with a very small load on each. As a consequence, the 
scheme is fourth order nearly everywhere and we expect 
our simulations to be 4th order convergent. This is in- 



Figure 3; Computational domain used in the simulations of 
this paper 

deed what our simulations below show. 



C. Resolution 

One of the features that a multipatch method offers is 
the flexibility to increase only the radial resolution while 
keeping the angular resolution constant. Given that the 
angular profile of the waveforms is dominated by a few 
low-^ modes, once a sufficient angular resolution is used 
the truncation error will be dominated by the radial res- 
olution. 

The approximate spherical symmetry in the vicinity of 
each black hole and at large distances from them allows 
the radial direction to be naturally defined for each jug- 
gling ball block. However, for the cuboidal blocks there 



5 



is some arbitrariness in how to choose the radial direc- 
tion. In practice, a radial direction for these blocks is 
useful only to define the direction along which resolution 
will be increased. In Fig. [5] the radial directions for the 
cuboidal blocks are indicated with arrows. 

We use an angular resolution of 7r/58 around each black 
hole and twice as much in the outer blocks. That is, 
there are 116 points along an equatorial line around each 
black hole and twice that number in the distant wave 
region. This is somehow inefficient since the solution is 
over-resolved in the angular directions compared to the 
radial one (especially in the wave region). The moti- 
vation behind this choice was to allow the grid points at 
the boundary faces of adjacent blocks to be in one-to-one 
correspondence with each other. In this way the commu- 
nication of the characteristic modes at the inter-patch 
boundaries does not require interpolation. 

In Table [T] wc show the total number of points in the 
whole domain and per block for the simulations of this 
paper. We increase resolution only along the radial di- 
rection, by the same number of points in all the domains. 
In our setup all blocks have the same number of points. 
Since parallelization is handled by assigning one block 
per processor, this guarantees a homogeneous load dis- 
tribution. 

The number of points shown in Table [I] is actually not 
large for a fully three-dimensional (i.e. no symmetries im- 
posed) finite-difference simulation. For example, we can 
compare these numbers to a binary black hole evolution 
with around the same number of orbits using Cartesian 
grids and adaptive mesh refinement . A typical state- 
of-the-art simulation uses six refinement levels around 
each black hole with 64'^ points on each level, and four 
coarse grid levels with 128^ points. This amounts to a 
total of 2 X 6 X 64^ -h 4 X 128^ w 226^ points. In the case 
of non-spinning, equal-mass black holes one can make 
use of the symmetry of the problem and reduce the total 
number of points to 6 x 64^ -I- 128^ « 154^. 

We have tested the performance of our multipatch par- 
allelization scheme for the evolution of a single black hole. 
In Fig. m we show a strong scaling test for up to 3, 000 
processors (cores), in which the total number of points 
is kept fixed while increasing the number of processors. 
We see that the speed of the code has a linear depen- 
dence on the number of processors. Similarly, in Fig. [5] 
we show a weak scaling test, where the load per proces- 
sor is kept fixed while increasing the number of processors 
used. The drop in speed in this case is about 15% over a 
range of 10 to 3,000 processors. We have not attempted 
to go beyond this number of cores. 

The phase errors in the waveforms shown in the next 
section are rather large compared to state-of-the-art sim- 
ulations (in particular, compared to an AMR one such as 
the one mentioned above). Since the code scales well and 
the number of points used in this paper (shown in Ta- 
ble [l| is reasonable for a finite-difference evolution, in 
principle wc could improve the accuracy of the simula- 
tions shown in the next section while still using modest 



Nr X Afang X iVbiockB = Ai'totai Speed (h~^) CPU (h) 

19 X 29^ X 192 = 145^ 2.83 67844 

22 X 29^ X 192 = 153^ 1.86 103226 

16 X 29^ X 384 = 173^ 2.42 158678 

Table I: Speed and CPU time for three resolutions. Nr and 
A'^ang are the radial and angular number of points per block, 
respectively, as described in the text. The speed is expressed 
in units of the total irreducible mass per hour. 
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Figure 4: Strong scaling test for a single black hole. The 
speed of the code depends essentially linearly on the number 
of processors, almost perfect scaling. 

computational resources. What has prevented us from 
doing so is a purely technical obstacle. The computa- 
tional infrastructure used in this paper, SpEC, was orig- 
inally designed for pscudo-spcctral evolutions, which are 
extremely efficient in terms of memory. For that reason 
SpEC currently stores in memory many more variables 
than are actually needed for evolving the system. As a 
result, in our ED simulations because of memory con- 
straints we actually end up using a few cores per node 
and a rather large number of nodes. We plan to improve 
SpEC's use of memory soon to eliminate this limitation. 
However, for the demonstrations in this paper, the cur- 
rent resolutions are sufficient. 



V. RESULTS 

Figure [5] shows the location of the ccntroids of the 
apparent horizons for the highest resolution simulation. 
The black holes complete about six orbits before reaching 
the merger regime. 

A. Convergence of the constraints 

A way of checking the consistency of the numerical so- 
lution is monitoring the constraint violations, since they 
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Figure 5: Weak scaling test for a single black hole. There 
is only a 15% drop in speed as the number of processors is 
increased while keeping the load per processor fixed. 




Figure 6: Black hole orbits. 



are not enforced during the evolution. In Fig. [7|we plot 
the norm of all the constraint fields of the first order 
generalized harmonic system, normalized by the norm 
of the spatial gradients of the dynamical fields, as defined 
in We show three runs with different resolutions. 

Figure [S] shows the convergence exponent of the 
norm of the normalized constraint violations, which is 
around four, as expected (cf. Sec. lIVBp . The conver- 
gence exponent n is defined as 



C2 — C3 



(2) 



Figure 7: L norm of the normalized constraints. 



where a is the ratio between the medium and coarse 
resolution and /3, the ratio between the fine and coarse 
one. Ci, C2, and C3 represent a given quantity at coarse, 
medium and fine resolutions, respectively. 

The uniform convergence is lost around t ^ 800 M , at 
which time the values for the coarse and medium resolu- 
tions intersect, as is seen in Fig.H 

We stop our simulations when the characteristic speeds 
at the excision boundary change sign, which means that 
there is spurious information entering the domain. That 
moment is characterized by a blow-up of the constraints. 
This feature is due to the inadequacy of the rather simple 
gauge conditions used here at times close to merger. At 
the time the simulations of this paper were performed, 
we used the same simple conditions used then by the 
Caltech-Cornell collaboration, namely, keeping the gauge 
source functions fixed in the comoving frame. Since then, 
better conditions have been developed, which do allow 
simulations to go through merger and ringdown [Tsl . For 
the purposes of this paper, however, following six orbits 
of an inspiral is sufficient. 



B. Waveforms 

Waveforms are computed via the Newman-Penrose 
curvature scalar ^'4 as in [23| . Subsequently we 
decompose ^4 in spin- weighted spherical harmonics 
-2Ytm{(), (/>)■ We focus our discussion to the £ = 2, m = 2 
mode. The extraction is done at r = 50 AI. 
Figure [HI shows the real component of ^'4 
We see that they all agree at early times and drift 
apart during the later stages of the evolution. A more 
meaningful comparison is shown in Figs. llOl and llll where 
we plot amplitude and phase of the extracted wave. The 
differences between the finite differences waveforms and 
the spectral one are shown in Figs. [T^] and [12] for the 
amplitude and phase, respectively. 
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Figure 8: Convergence exponent for the L norm of the nor- 
mahzed constraints. 



Figure 11: ^'4 phase for the finite difference and spectral re- 
sults. 
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Figure 9: Real part of ^4 for the finite difference and spectral 
results. 



Figure 12: Differences in the ^'4 amplitude between the finite 
difference and the spectral results. 
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Figure 10: *E'4 amplitude for the finite difference and spectral 
results. 



Figure 13: Differences in the <I'4 phase between the finite 
difference and the spectral results. 
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